98 research outputs found
Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
Most of the Neural Machine Translation (NMT) models are based on the
sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped
with the attention mechanism. However, the conventional attention mechanism
treats the decoding at each time step equally with the same matrix, which is
problematic since the softness of the attention for different types of words
(e.g. content words and function words) should differ. Therefore, we propose a
new model with a mechanism called Self-Adaptive Control of Temperature (SACT)
to control the softness of attention by means of an attention temperature.
Experimental results on the Chinese-English translation and English-Vietnamese
translation demonstrate that our model outperforms the baseline models, and the
analysis and the case study show that our model can attend to the most relevant
elements in the source-side contexts and generate the translation of high
quality.Comment: To appear in EMNLP 201
- …